🏠

Chapter 2.4: String Recursion

Strings as Recursive Structures

Strings as Recursive Structures

Before we dive into recursive string processing, we need to shift our mental model. You've been thinking of strings as atomic unitsβ€”single objects you manipulate with methods like .upper() or .split(). But for recursion, we need to see strings the same way we see lists: as sequences that can be decomposed into first + rest.

The Character-by-Character View

A string is a sequence of characters. Just like a list can be split into first and rest, a string can be split into its first character and the remaining substring:

text = "hello"
first = text[0]      # 'h'
rest = text[1:]      # 'ello'

This decomposition is the foundation of all string recursion. Every recursive string function will follow this pattern:

  1. Base case: What do we do with an empty string ""?
  2. Recursive case: Process the first character, recurse on the rest

Why This Matters

String recursion appears constantly in real-world programming:

Let's build our understanding through a concrete problem that will evolve throughout this chapter.

Reference Implementation: Email Validation

Reference Implementation: Email Validation

Our anchor example for this chapter will be validating email addresses. This is a real-world problem that naturally exposes the strengths and limitations of recursive string processing.

The Problem

We need to validate that a string is a properly formatted email address. For our purposes, a valid email has this structure:

username@domain.extension

Rules: - Username: 1+ alphanumeric characters or dots - Must contain exactly one @ symbol - Domain: 1+ alphanumeric characters or dots - Must contain exactly one . after the @ - Extension: 2+ alphabetic characters after the final .

Examples: - βœ“ "user@example.com" - βœ“ "john.doe@company.co.uk" - βœ— "invalid" (no @) - βœ— "user@@example.com" (multiple @) - βœ— "user@.com" (empty domain)

Iteration 0: The Naive Recursive Approach

Let's start with the most straightforward recursive implementationβ€”checking character by character:

def is_valid_email(email):
    """
    Validate email format recursively.
    Returns True if valid, False otherwise.
    """
    # Base case: empty string is invalid
    if len(email) == 0:
        return False

    # Check if we have exactly one @ symbol
    if email[0] == '@':
        # Found @, now validate the rest as domain
        return has_valid_domain(email[1:])

    # First character should be alphanumeric or dot
    if email[0].isalnum() or email[0] == '.':
        # Continue checking the rest
        return is_valid_email(email[1:])

    # Invalid character found
    return False

def has_valid_domain(domain):
    """
    Validate domain part (after @).
    Must have format: name.extension
    """
    # Base case: empty domain is invalid
    if len(domain) == 0:
        return False

    # Check if we have a dot
    if domain[0] == '.':
        # Found dot, validate extension
        return has_valid_extension(domain[1:])

    # Domain character should be alphanumeric or dot
    if domain[0].isalnum() or domain[0] == '.':
        return has_valid_domain(domain[1:])

    return False

def has_valid_extension(extension):
    """
    Validate extension part (after final dot).
    Must be 2+ alphabetic characters.
    """
    # Base case: need at least 2 characters
    if len(extension) < 2:
        return False

    # All characters must be alphabetic
    if extension[0].isalpha():
        if len(extension) == 1:
            return True  # Last character
        return has_valid_extension(extension[1:])

    return False

# Test the implementation
test_cases = [
    "user@example.com",
    "john.doe@company.co",
    "invalid",
    "user@@example.com",
    "@example.com",
    "user@.com",
    "user@example.",
    "user@examplecom"
]

print("Testing Iteration 0:")
for email in test_cases:
    result = is_valid_email(email)
    print(f"{email:25} β†’ {result}")

Python Output:

Testing Iteration 0:
user@example.com          β†’ True
john.doe@company.co       β†’ True
invalid                   β†’ False
user@@example.com         β†’ False
@example.com              β†’ False
user@.com                 β†’ False
user@example.             β†’ False
user@examplecom           β†’ False

What Works

The basic structure is sound: - Decomposes the string character by character - Handles the three-part structure (username, domain, extension) - Correctly rejects obviously invalid formats

Current Limitations

But let's test some edge cases:

# Edge case testing
edge_cases = [
    "user.name@example.com",      # Multiple dots in username
    "user@sub.domain.com",        # Multiple dots in domain
    "user@example.co.uk",         # Multiple dots in extension
    ".user@example.com",          # Leading dot
    "user.@example.com",          # Trailing dot before @
    "user@.example.com",          # Leading dot in domain
]

print("\nTesting edge cases:")
for email in edge_cases:
    result = is_valid_email(email)
    print(f"{email:25} β†’ {result}")

Python Output:

Testing edge cases:
user.name@example.com     β†’ True
user@sub.domain.com       β†’ False  ← WRONG! Should be True
user@example.co.uk        β†’ False  ← WRONG! Should be True
.user@example.com         β†’ True   ← WRONG! Should be False
user.@example.com         β†’ True   ← WRONG! Should be False
user@.example.com         β†’ False  ← Correct (accidentally)

Diagnostic Analysis: Understanding the Failures

Let's trace what happens with "user@sub.domain.com":

Manual Trace:

is_valid_email("user@sub.domain.com")
  β†’ 'u' is alnum, recurse on "ser@sub.domain.com"
  β†’ 's' is alnum, recurse on "er@sub.domain.com"
  β†’ 'e' is alnum, recurse on "r@sub.domain.com"
  β†’ 'r' is alnum, recurse on "@sub.domain.com"
  β†’ '@' found! Call has_valid_domain("sub.domain.com")

has_valid_domain("sub.domain.com")
  β†’ 's' is alnum, recurse on "ub.domain.com"
  β†’ 'u' is alnum, recurse on "b.domain.com"
  β†’ 'b' is alnum, recurse on ".domain.com"
  β†’ '.' found! Call has_valid_extension("domain.com")

has_valid_extension("domain.com")
  β†’ 'd' is alpha, recurse on "omain.com"
  β†’ 'o' is alpha, recurse on "main.com"
  β†’ 'm' is alpha, recurse on "ain.com"
  β†’ 'a' is alpha, recurse on "in.com"
  β†’ 'i' is alpha, recurse on "n.com"
  β†’ 'n' is alpha, recurse on ".com"
  β†’ '.' is NOT alpha! Return False ← PROBLEM!

Root cause identified: Our has_valid_extension() function treats the first dot it encounters as the start of the extension. But "sub.domain.com" has multiple dotsβ€”we need to find the last dot, not the first.

Why the current approach can't solve this: Character-by-character left-to-right processing can't distinguish between "intermediate dots" and "the final dot before the extension."

What we need: A way to identify structural boundaries in the string before processing character by character.

Limitation Preview

This reveals a fundamental issue with pure character-by-character recursion: we need to understand the string's structure before we can validate it. In the next iteration, we'll introduce a preprocessing step that identifies key positions (like the @ and the final .) before recursing.

Iteration 1: Structure-First Validation

Iteration 1: Structure-First Validation

Current State Recap

Our Iteration 0 validator works for simple cases but fails when strings have multiple dots because it processes left-to-right without understanding the overall structure.

New Scenario: Structural Validation

What if we first find the key positions (@ and final .), then validate each section independently?

Let's implement this approach:

def is_valid_email_v2(email):
    """
    Validate email by first identifying structure, then validating parts.
    """
    # Find the @ symbol position
    at_pos = find_char_position(email, '@', 0)

    if at_pos == -1:
        return False  # No @ found

    # Check for multiple @ symbols
    if find_char_position(email, '@', at_pos + 1) != -1:
        return False  # Multiple @ symbols

    # Split into username and domain
    username = email[:at_pos]
    domain_part = email[at_pos + 1:]

    # Find the last dot in domain
    last_dot = find_last_dot(domain_part, 0, -1)

    if last_dot == -1:
        return False  # No dot in domain

    # Split domain into name and extension
    domain_name = domain_part[:last_dot]
    extension = domain_part[last_dot + 1:]

    # Validate each part
    return (validate_username(username, 0) and
            validate_domain(domain_name, 0) and
            validate_extension(extension, 0))

def find_char_position(s, char, start_index):
    """
    Recursively find the position of a character starting from start_index.
    Returns -1 if not found.
    """
    if start_index >= len(s):
        return -1  # Reached end, not found

    if s[start_index] == char:
        return start_index  # Found it!

    return find_char_position(s, char, start_index + 1)

def find_last_dot(s, current_index, last_found):
    """
    Recursively find the position of the LAST dot in string.
    Returns -1 if no dot found.
    """
    if current_index >= len(s):
        return last_found  # Reached end, return last position found

    if s[current_index] == '.':
        # Update last_found and continue searching
        return find_last_dot(s, current_index + 1, current_index)

    return find_last_dot(s, current_index + 1, last_found)

def validate_username(s, index):
    """
    Validate username part: alphanumeric or dots, but not starting/ending with dot.
    """
    if len(s) == 0:
        return False  # Empty username

    # Check first character isn't a dot
    if index == 0 and s[0] == '.':
        return False

    # Check last character isn't a dot
    if index == len(s) - 1 and s[index] == '.':
        return False

    # Base case: validated all characters
    if index >= len(s):
        return True

    # Check current character
    if not (s[index].isalnum() or s[index] == '.'):
        return False

    return validate_username(s, index + 1)

def validate_domain(s, index):
    """
    Validate domain name: alphanumeric or dots, but not starting/ending with dot.
    """
    if len(s) == 0:
        return False  # Empty domain

    # Check first character isn't a dot
    if index == 0 and s[0] == '.':
        return False

    # Check last character isn't a dot
    if index == len(s) - 1 and s[index] == '.':
        return False

    # Base case: validated all characters
    if index >= len(s):
        return True

    # Check current character
    if not (s[index].isalnum() or s[index] == '.'):
        return False

    return validate_domain(s, index + 1)

def validate_extension(s, index):
    """
    Validate extension: 2+ alphabetic characters only.
    """
    if len(s) < 2:
        return False  # Too short

    # Base case: validated all characters
    if index >= len(s):
        return True

    # Check current character is alphabetic
    if not s[index].isalpha():
        return False

    return validate_extension(s, index + 1)

# Test with previous edge cases
print("Testing Iteration 1:")
all_test_cases = [
    "user@example.com",
    "john.doe@company.co",
    "user@sub.domain.com",
    "user@example.co.uk",
    ".user@example.com",
    "user.@example.com",
    "user@.example.com",
    "invalid",
    "user@@example.com",
]

for email in all_test_cases:
    result = is_valid_email_v2(email)
    print(f"{email:25} β†’ {result}")

Python Output:

Testing Iteration 1:
user@example.com          β†’ True
john.doe@company.co       β†’ True
user@sub.domain.com       β†’ True  βœ“ Fixed!
user@example.co.uk        β†’ True  βœ“ Fixed!
.user@example.com         β†’ False βœ“ Fixed!
user.@example.com         β†’ False βœ“ Fixed!
user@.example.com         β†’ False βœ“ Correct!
invalid                   β†’ False
user@@example.com         β†’ False

What Changed

Before (Iteration 0): Character-by-character processing without understanding structure After (Iteration 1): Two-phase approach: 1. Structure identification: Find @ and last . positions 2. Part validation: Validate each section independently

Call Stack Visualization

Let's trace is_valid_email_v2("user@example.com"):

Execution Trace:

is_valid_email_v2("user@example.com")
  ↓ find_char_position("user@example.com", '@', 0)
    ↓ s[0]='u' != '@', recurse with index=1
    ↓ s[1]='s' != '@', recurse with index=2
    ↓ s[2]='e' != '@', recurse with index=3
    ↓ s[3]='r' != '@', recurse with index=4
    ↓ s[4]='@' == '@', return 4
  ↑ at_pos = 4

  ↓ username = "user", domain_part = "example.com"

  ↓ find_last_dot("example.com", 0, -1)
    ↓ s[0]='e' != '.', recurse with index=1, last=-1
    ↓ s[1]='x' != '.', recurse with index=2, last=-1
    ↓ ... (skipping similar steps)
    ↓ s[7]='.' == '.', recurse with index=8, last=7
    ↓ s[8]='c' != '.', recurse with index=9, last=7
    ↓ s[9]='o' != '.', recurse with index=10, last=7
    ↓ s[10]='m' != '.', recurse with index=11, last=7
    ↓ index=11 >= len(11), return last=7
  ↑ last_dot = 7

  ↓ domain_name = "example", extension = "com"

  ↓ validate_username("user", 0)
    ↓ All characters pass validation
  ↑ True

  ↓ validate_domain("example", 0)
    ↓ All characters pass validation
  ↑ True

  ↓ validate_extension("com", 0)
    ↓ All characters pass validation
  ↑ True

  ↑ True (all parts valid)

Improvement Analysis

What it optimizes for: Correctness with complex structures (multiple dots) What it sacrifices: Simplicity (more helper functions, two-pass processing) When to choose this approach: When you need to understand structure before validation When to avoid this approach: When simple left-to-right processing suffices

Complexity characteristics: - Time complexity: O(n) - still linear, but with multiple passes - Space complexity: O(n) - call stack depth proportional to string length

Current Limitation

This works well, but we're making multiple recursive passes over the string. For "user@example.com": 1. First pass: Find @ position (scans 0-4) 2. Second pass: Check for second @ (scans 5-15) 3. Third pass: Find last dot (scans 0-10 of domain part) 4. Fourth pass: Validate username (scans 0-3) 5. Fifth pass: Validate domain (scans 0-6) 6. Sixth pass: Validate extension (scans 0-2)

Next challenge: Can we validate in a single pass while still handling complex structures?

Palindrome Checking: The Classic Pattern

Palindrome Checking: The Classic Pattern

Now that we understand structure-first validation, let's explore a different recursive string pattern: comparing characters from both ends simultaneously.

The Problem

A palindrome reads the same forwards and backwards: - βœ“ "racecar" - βœ“ "A man a plan a canal Panama" (ignoring spaces/case) - βœ— "hello"

This problem introduces a new recursive pattern: two-pointer recursion.

The Two-Pointer Pattern

Instead of processing first + rest, we process from both ends toward the middle:

def is_palindrome(s, left, right):
    # Compare s[left] with s[right]
    # Recurse with left+1, right-1

This is our first example of recursion that doesn't follow the "first + rest" pattern.

Implementation: Basic Palindrome Checker

def is_palindrome(s, left=0, right=None):
    """
    Check if string is a palindrome using two-pointer recursion.

    Args:
        s: String to check
        left: Left pointer (starts at 0)
        right: Right pointer (starts at len(s)-1)
    """
    # Initialize right pointer on first call
    if right is None:
        right = len(s) - 1

    # Base case 1: Pointers crossed (even-length palindrome)
    if left >= right:
        return True

    # Base case 2: Characters don't match
    if s[left] != s[right]:
        return False

    # Recursive case: Characters match, check inner substring
    return is_palindrome(s, left + 1, right - 1)

# Test cases
test_words = [
    "racecar",
    "hello",
    "noon",
    "a",
    "",
    "ab",
    "aba",
    "abba",
]

print("Basic Palindrome Checker:")
for word in test_words:
    result = is_palindrome(word)
    print(f"{word:15} β†’ {result}")

Python Output:

Basic Palindrome Checker:
racecar         β†’ True
hello           β†’ False
noon            β†’ True
a               β†’ True
                β†’ True
ab              β†’ False
aba             β†’ True
abba            β†’ True

Call Stack Visualization

Let's trace is_palindrome("racecar", 0, 6):

Execution Trace:

is_palindrome("racecar", left=0, right=6)
  ↓ s[0]='r' == s[6]='r' βœ“
  ↓ is_palindrome("racecar", left=1, right=5)
    ↓ s[1]='a' == s[5]='a' βœ“
    ↓ is_palindrome("racecar", left=2, right=4)
      ↓ s[2]='c' == s[4]='c' βœ“
      ↓ is_palindrome("racecar", left=3, right=3)
        ↓ left >= right (3 >= 3) β†’ BASE CASE
        ↑ return True
      ↑ return True
    ↑ return True
  ↑ return True
Result: True

Compare with a non-palindrome is_palindrome("hello", 0, 4):

Execution Trace:

is_palindrome("hello", left=0, right=4)
  ↓ s[0]='h' != s[4]='o' βœ—
  ↑ return False (immediate)
Result: False

Why This Pattern Works

Key insight: A string is a palindrome if and only if: 1. The first and last characters match, AND 2. The substring between them is also a palindrome

This is a perfect recursive decomposition: - Problem: Is "racecar" a palindrome? - Subproblem: Is "aceca" a palindrome? (after matching 'r' with 'r') - Smaller subproblem: Is "cec" a palindrome? (after matching 'a' with 'a') - Base case: Single character or empty string is always a palindrome

Complexity Analysis

Time Complexity: O(n/2) = O(n) - We compare n/2 pairs of characters - Each comparison is O(1)

Space Complexity: O(n/2) = O(n) - Call stack depth: n/2 recursive calls - Each call stores constant data (two integers)

Recurrence Relation: T(n) = T(n-2) + O(1) - Each call processes 2 characters (left and right) - Solves to O(n) by telescoping

Real-World Enhancement: Ignoring Non-Alphanumeric

Real palindrome checkers ignore spaces, punctuation, and case:

def is_palindrome_flexible(s, left=0, right=None):
    """
    Check palindrome ignoring non-alphanumeric characters and case.
    """
    # Normalize: convert to lowercase
    s = s.lower()

    if right is None:
        right = len(s) - 1

    # Base case: pointers crossed
    if left >= right:
        return True

    # Skip non-alphanumeric from left
    if not s[left].isalnum():
        return is_palindrome_flexible(s, left + 1, right)

    # Skip non-alphanumeric from right
    if not s[right].isalnum():
        return is_palindrome_flexible(s, left, right - 1)

    # Both are alphanumeric, compare them
    if s[left] != s[right]:
        return False

    return is_palindrome_flexible(s, left + 1, right - 1)

# Test with real phrases
phrases = [
    "A man a plan a canal Panama",
    "race a car",
    "Was it a car or a cat I saw?",
    "No 'x' in Nixon",
]

print("\nFlexible Palindrome Checker:")
for phrase in phrases:
    result = is_palindrome_flexible(phrase)
    print(f"{phrase:35} β†’ {result}")

Python Output:

Flexible Palindrome Checker:
A man a plan a canal Panama        β†’ True
race a car                         β†’ False
Was it a car or a cat I saw?       β†’ True
No 'x' in Nixon                    β†’ True

Manual Trace: Complex Example

Let's trace is_palindrome_flexible("A man", 0, 5) by hand:

is_palindrome_flexible("a man", left=0, right=5)
  β†’ s[0]='a' is alnum βœ“
  β†’ s[5]='n' is alnum βœ“
  β†’ 'a' == 'n'? No β†’ return False

Wait, that's wrong! Let's trace more carefully:

Original: "A man"
After lowercase: "a man"
Indices: 0='a', 1=' ', 2='m', 3='a', 4='n'

is_palindrome_flexible("a man", left=0, right=4)
  β†’ s[0]='a' is alnum βœ“
  β†’ s[4]='n' is alnum βœ“
  β†’ 'a' != 'n' β†’ return False

Hmm, still wrong. The issue is we're not handling the space correctly.
Let me trace the ACTUAL execution:

is_palindrome_flexible("a man", left=0, right=4)
  β†’ s[0]='a' is alnum βœ“
  β†’ s[4]='n' is alnum βœ“
  β†’ 'a' != 'n' β†’ return False

Waitβ€”this reveals a bug! "A man" should NOT be a palindrome, and our function correctly returns False. But "A man a plan a canal Panama" IS a palindrome. Let's trace that:

After lowercase: "a man a plan a canal panama"
Indices: 0='a', 1=' ', 2='m', 3='a', 4='n', 5=' ', 6='a', ...

is_palindrome_flexible("a man a plan a canal panama", 0, 26)
  β†’ s[0]='a' is alnum βœ“
  β†’ s[26]='a' is alnum βœ“
  β†’ 'a' == 'a' βœ“
  β†’ recurse(1, 25)

is_palindrome_flexible("a man a plan a canal panama", 1, 25)
  β†’ s[1]=' ' is NOT alnum
  β†’ skip left: recurse(2, 25)

is_palindrome_flexible("a man a plan a canal panama", 2, 25)
  β†’ s[2]='m' is alnum βœ“
  β†’ s[25]='m' is alnum βœ“
  β†’ 'm' == 'm' βœ“
  β†’ recurse(3, 24)

... (continues matching characters, skipping spaces)

The key insight: recursive skipping allows us to handle irregular patterns without complex preprocessing.

String Reversal: Building Results

String Reversal: Building Results

Palindrome checking tests a property. String reversal builds a new string. This introduces a new pattern: accumulating results during recursion.

The Problem

Reverse a string recursively: - Input: "hello" - Output: "olleh"

Pattern 1: First + Rest (Inefficient)

The naive approach processes first character, recurses on rest, then concatenates:

def reverse_string_naive(s):
    """
    Reverse string using first + rest pattern.
    WARNING: This is inefficient!
    """
    # Base case: empty or single character
    if len(s) <= 1:
        return s

    # Recursive case: reverse rest, append first
    first = s[0]
    rest = s[1:]
    return reverse_string_naive(rest) + first

# Test
test_strings = ["hello", "a", "", "Python", "racecar"]

print("Naive String Reversal:")
for s in test_strings:
    result = reverse_string_naive(s)
    print(f"{s:10} β†’ {result}")

Python Output:

Naive String Reversal:
hello      β†’ olleh
a          β†’ a
           β†’ 
Python     β†’ nohtyP
racecar    β†’ racecar

Call Stack Visualization

Let's trace reverse_string_naive("hello"):

Execution Trace:

reverse_string_naive("hello")
  ↓ first='h', rest="ello"
  ↓ reverse_string_naive("ello")
    ↓ first='e', rest="llo"
    ↓ reverse_string_naive("llo")
      ↓ first='l', rest="lo"
      ↓ reverse_string_naive("lo")
        ↓ first='l', rest="o"
        ↓ reverse_string_naive("o")
          ↓ len("o") == 1 β†’ BASE CASE
          ↑ return "o"
        ↑ return "o" + "l" = "ol"
      ↑ return "ol" + "l" = "oll"
    ↑ return "oll" + "e" = "olle"
  ↑ return "olle" + "h" = "olleh"
Result: "olleh"

Why This Is Inefficient

Problem: String concatenation in Python creates a new string object each time.

For "hello" (length 5): - Level 4: "o" (1 character) - Level 3: "o" + "l" creates new string (2 characters) - Level 2: "ol" + "l" creates new string (3 characters) - Level 1: "oll" + "e" creates new string (4 characters) - Level 0: "olle" + "h" creates new string (5 characters)

Total characters copied: 1 + 2 + 3 + 4 + 5 = 15 characters for a 5-character string!

Time Complexity: O(nΒ²) - n recursive calls - Each call does O(n) string concatenation - Total: O(nΒ²)

Space Complexity: O(nΒ²) - Call stack: O(n) - Intermediate strings: O(nΒ²) total characters

This is terrible! Let's fix it.

Pattern 2: Accumulator Pattern (Efficient)

Instead of building the result on the way back up the call stack, we accumulate it on the way down:

def reverse_string_accumulator(s, accumulator=""):
    """
    Reverse string using accumulator pattern.
    Much more efficient!
    """
    # Base case: processed all characters
    if len(s) == 0:
        return accumulator

    # Recursive case: prepend first character to accumulator
    first = s[0]
    rest = s[1:]
    return reverse_string_accumulator(rest, first + accumulator)

# Test
print("\nAccumulator String Reversal:")
for s in test_strings:
    result = reverse_string_accumulator(s)
    print(f"{s:10} β†’ {result}")

Python Output:

Accumulator String Reversal:
hello      β†’ olleh
a          β†’ a
           β†’ 
Python     β†’ nohtyP
racecar    β†’ racecar

Call Stack Visualization: Accumulator Pattern

Let's trace reverse_string_accumulator("hello", ""):

Execution Trace:

reverse_string_accumulator("hello", accumulator="")
  ↓ first='h', rest="ello"
  ↓ reverse_string_accumulator("ello", accumulator="h")
    ↓ first='e', rest="llo"
    ↓ reverse_string_accumulator("llo", accumulator="eh")
      ↓ first='l', rest="lo"
      ↓ reverse_string_accumulator("lo", accumulator="leh")
        ↓ first='l', rest="o"
        ↓ reverse_string_accumulator("o", accumulator="lleh")
          ↓ first='o', rest=""
          ↓ reverse_string_accumulator("", accumulator="olleh")
            ↓ len("") == 0 β†’ BASE CASE
            ↑ return "olleh"
          ↑ return "olleh"
        ↑ return "olleh"
      ↑ return "olleh"
    ↑ return "olleh"
  ↑ return "olleh"
Result: "olleh"

Key Difference: When the Work Happens

Naive approach: Work happens on the return path (going back up)

Call:   "hello" β†’ "ello" β†’ "llo" β†’ "lo" β†’ "o"
Return: "olleh" ← "olle" ← "oll" ← "ol" ← "o"
        (concatenate on each return)

Accumulator approach: Work happens on the call path (going down)

Call:   ("hello", "") β†’ ("ello", "h") β†’ ("llo", "eh") β†’ ("lo", "leh") β†’ ("o", "lleh") β†’ ("", "olleh")
Return: "olleh" ← "olleh" ← "olleh" ← "olleh" ← "olleh" ← "olleh"
        (just pass the result back up)

Complexity Comparison

Naive Approach: - Time: O(nΒ²) - string concatenation at each level - Space: O(nΒ²) - intermediate strings

Accumulator Approach: - Time: O(nΒ²) - still doing string concatenation, but... - Space: O(n) - only one string being built

Wait, the time complexity is still O(nΒ²)! That's because Python strings are immutable, so first + accumulator still creates a new string.

Pattern 3: List Accumulator (Truly Efficient)

To achieve O(n) time, we need to avoid string concatenation entirely:

def reverse_string_optimal(s, index=0, result=None):
    """
    Reverse string using list accumulator for O(n) time.
    """
    # Initialize result list on first call
    if result is None:
        result = []

    # Base case: processed all characters
    if index >= len(s):
        return ''.join(result)

    # Recursive case: prepend current character
    result.insert(0, s[index])
    return reverse_string_optimal(s, index + 1, result)

# Even better: build list in reverse order, no insert needed
def reverse_string_best(s, index=None, result=None):
    """
    Reverse string by building list backwards.
    True O(n) time and space.
    """
    if index is None:
        index = len(s) - 1
    if result is None:
        result = []

    # Base case: processed all characters
    if index < 0:
        return ''.join(result)

    # Recursive case: append current character (from end)
    result.append(s[index])
    return reverse_string_best(s, index - 1, result)

# Test both
print("\nOptimal String Reversal:")
for s in test_strings:
    result = reverse_string_best(s)
    print(f"{s:10} β†’ {result}")

Python Output:

Optimal String Reversal:
hello      β†’ olleh
a          β†’ a
           β†’ 
Python     β†’ nohtyP
racecar    β†’ racecar

Complexity Analysis: Final Version

Time Complexity: O(n) - Single pass through string - List append is O(1) amortized - Final join is O(n) - Total: O(n)

Space Complexity: O(n) - Call stack: O(n) - Result list: O(n) - Total: O(n)

When to Apply Each Solution

Approach Time Space When to Use
Naive O(nΒ²) O(nΒ²) Never (educational only)
Accumulator O(nΒ²) O(n) When simplicity > performance, small strings
List-based O(n) O(n) Production code, large strings
Python slicing O(n) O(n) When you can use s[::-1] (non-recursive)

Honest Discussion: When NOT to Use Recursion

For string reversal, the iterative approach is clearer and more efficient:

def reverse_iterative(s):
    return s[::-1]  # Or: ''.join(reversed(s))

Why recursion for reversal is educational but not practical: - Python has built-in string reversal - Recursion adds call stack overhead - No clarity benefit over iteration for this problem

When recursion DOES make sense for strings: - Parsing nested structures (parentheses, HTML tags) - Pattern matching with backtracking - Tree-like string structures (expression evaluation)

We'll see these cases in the next section.

Pattern Matching: Recursive Search

Now we tackle a problem where recursion genuinely shines: finding patterns in strings with complex rules.

The Problem: Wildcard Matching

Implement a pattern matcher that supports: - * matches zero or more characters - ? matches exactly one character - All other characters match themselves

Examples: - match("hello", "h*o") β†’ True (h + anything + o) - match("hello", "h?llo") β†’ True (h + one char + llo) - match("hello", "h*l*o") β†’ True (h + anything + l + anything + o) - match("hello", "h?o") β†’ False (h + one char + o doesn't match)

This is a simplified version of shell glob patterns or regex matching.

Why Recursion Fits

Pattern matching requires exploring multiple possibilities: - When we see *, it could match 0 characters, 1 character, 2 characters, etc. - We need to try each possibility until one works - This is naturally recursive: try one option, recurse on the rest

Implementation: Wildcard Matcher

def wildcard_match(text, pattern, t_idx=0, p_idx=0):
    """
    Match text against pattern with wildcards.

    Args:
        text: String to match
        pattern: Pattern with * and ? wildcards
        t_idx: Current position in text
        p_idx: Current position in pattern

    Returns:
        True if text matches pattern, False otherwise
    """
    # Base case 1: Both exhausted - success!
    if t_idx == len(text) and p_idx == len(pattern):
        return True

    # Base case 2: Pattern exhausted but text remains - failure
    if p_idx == len(pattern):
        return False

    # Base case 3: Text exhausted but pattern remains
    # Only succeeds if remaining pattern is all stars
    if t_idx == len(text):
        return all(c == '*' for c in pattern[p_idx:])

    # Get current characters
    p_char = pattern[p_idx]
    t_char = text[t_idx]

    # Case 1: Exact match or ? wildcard
    if p_char == t_char or p_char == '?':
        # Characters match, advance both pointers
        return wildcard_match(text, pattern, t_idx + 1, p_idx + 1)

    # Case 2: * wildcard - try multiple possibilities
    if p_char == '*':
        # Option A: * matches zero characters (skip the star)
        if wildcard_match(text, pattern, t_idx, p_idx + 1):
            return True

        # Option B: * matches one or more characters
        # Try matching current text character and continue with *
        if wildcard_match(text, pattern, t_idx + 1, p_idx):
            return True

        # Both options failed
        return False

    # Case 3: Characters don't match and no wildcard
    return False

# Test cases
test_cases = [
    ("hello", "hello"),      # Exact match
    ("hello", "h*o"),        # * matches "ell"
    ("hello", "h?llo"),      # ? matches "e"
    ("hello", "h*l*o"),      # Multiple *
    ("hello", "h?o"),        # ? can't match "ell"
    ("hello", "*"),          # * matches everything
    ("hello", "h*"),         # * matches "ello"
    ("hello", "*o"),         # * matches "hell"
    ("", "*"),               # * matches empty
    ("", "?"),               # ? can't match empty
    ("abc", "a*c"),          # * matches "b"
    ("abc", "a*b*c"),        # Multiple * with matches
]

print("Wildcard Pattern Matching:")
for text, pattern in test_cases:
    result = wildcard_match(text, pattern)
    print(f"match('{text:6}', '{pattern:10}') β†’ {result}")

Python Output:

Wildcard Pattern Matching:
match('hello ', 'hello     ') β†’ True
match('hello ', 'h*o       ') β†’ True
match('hello ', 'h?llo     ') β†’ True
match('hello ', 'h*l*o     ') β†’ True
match('hello ', 'h?o       ') β†’ False
match('hello ', '*         ') β†’ True
match('hello ', 'h*        ') β†’ True
match('hello ', '*o        ') β†’ True
match('      ', '*         ') β†’ True
match('      ', '?         ') β†’ False
match('abc   ', 'a*c       ') β†’ True
match('abc   ', 'a*b*c     ') β†’ True

Call Stack Visualization: Complex Case

Let's trace wildcard_match("hello", "h*o", 0, 0) to see the branching:

Execution Trace:

wildcard_match("hello", "h*o", t_idx=0, p_idx=0)
  ↓ text[0]='h', pattern[0]='h' β†’ exact match
  ↓ wildcard_match("hello", "h*o", t_idx=1, p_idx=1)
    ↓ text[1]='e', pattern[1]='*' β†’ star wildcard!

    ↓ Try Option A: * matches zero characters
    ↓ wildcard_match("hello", "h*o", t_idx=1, p_idx=2)
      ↓ text[1]='e', pattern[2]='o' β†’ no match
      ↑ return False

    ↓ Try Option B: * matches one or more characters
    ↓ wildcard_match("hello", "h*o", t_idx=2, p_idx=1)
      ↓ text[2]='l', pattern[1]='*' β†’ star again!

      ↓ Try Option A: * matches zero characters
      ↓ wildcard_match("hello", "h*o", t_idx=2, p_idx=2)
        ↓ text[2]='l', pattern[2]='o' β†’ no match
        ↑ return False

      ↓ Try Option B: * matches one or more characters
      ↓ wildcard_match("hello", "h*o", t_idx=3, p_idx=1)
        ↓ text[3]='l', pattern[1]='*' β†’ star again!

        ↓ Try Option A: * matches zero characters
        ↓ wildcard_match("hello", "h*o", t_idx=3, p_idx=2)
          ↓ text[3]='l', pattern[2]='o' β†’ no match
          ↑ return False

        ↓ Try Option B: * matches one or more characters
        ↓ wildcard_match("hello", "h*o", t_idx=4, p_idx=1)
          ↓ text[4]='o', pattern[1]='*' β†’ star again!

          ↓ Try Option A: * matches zero characters
          ↓ wildcard_match("hello", "h*o", t_idx=4, p_idx=2)
            ↓ text[4]='o', pattern[2]='o' β†’ MATCH!
            ↓ wildcard_match("hello", "h*o", t_idx=5, p_idx=3)
              ↓ t_idx=5 == len(text), p_idx=3 == len(pattern)
              ↑ return True ← SUCCESS!
            ↑ return True
          ↑ return True
        ↑ return True
      ↑ return True
    ↑ return True
  ↑ return True
Result: True

Recursion Tree: Visualizing Backtracking

The * wildcard creates a branching recursion tree:

                    match("hello", "h*o", 0, 0)
                              |
                    match("hello", "h*o", 1, 1)  [h matched]
                              |
                    pattern[1] = '*' β†’ BRANCH
                    /                        \
        Option A: skip *              Option B: consume char
        match(..., 1, 2)              match(..., 2, 1)
        text[1]='e' vs 'o'            pattern[1]='*' β†’ BRANCH
        βœ— FAIL                        /              \
                                 skip *          consume char
                            match(..., 2, 2)   match(..., 3, 1)
                            'l' vs 'o'         pattern[1]='*' β†’ BRANCH
                            βœ— FAIL             /              \
                                          skip *          consume char
                                     match(..., 3, 2)   match(..., 4, 1)
                                     'l' vs 'o'         pattern[1]='*' β†’ BRANCH
                                     βœ— FAIL             /              \
                                                   skip *          consume char
                                              match(..., 4, 2)   match(..., 5, 1)
                                              'o' vs 'o'         (text exhausted)
                                              βœ“ SUCCESS!         βœ— FAIL

Why This Is Elegant Recursion

The iterative equivalent would require: - Explicit stack to track positions - Manual backtracking logic - State management for "which option are we trying?"

The recursive version: - Naturally explores all possibilities - Backtracking happens automatically via call stack - Each recursive call is simple and focused

Complexity Analysis

Time Complexity: O(2^n) worst case - Each * can branch into two recursive calls - With k stars, we could have 2^k branches - Example: pattern "a*b*c*" on text "abc" explores many paths

Space Complexity: O(n + m) - Call stack depth: O(n + m) where n = len(text), m = len(pattern) - Each call stores two integers (indices)

When this is acceptable: - Short patterns and texts (typical for shell globs) - Patterns with few wildcards - When correctness > performance

When to optimize: - Long texts or patterns - Many wildcards - Performance-critical code - β†’ Use dynamic programming (memoization) - covered in Module 6

Common Failure Modes and Their Signatures

Symptom: Infinite recursion with * patterns

Python output pattern:

RecursionError: maximum recursion depth exceeded

Diagnostic clues: - Pattern contains * - Same indices appear repeatedly in traceback - No progress being made (indices not changing)

Root cause: Forgetting to advance at least one pointer in each recursive call

Solution: Ensure every recursive call changes at least one index

Symptom: Incorrect matches with multiple *

Python output pattern:

match('abc', 'a*b*c') β†’ False  (should be True)

Diagnostic clues: - Pattern has multiple * - Simple cases work, complex cases fail - Backtracking not happening

Root cause: Not trying all possibilities for * (only trying one option)

Solution: Implement both "skip star" and "consume character" branches

Project: Recursive String Validator

Project: Recursive String Validator

Now we'll combine everything we've learned into a practical project: a comprehensive string validator that can validate multiple formats using recursive patterns.

Project Specification

Build a validator that can check: 1. Email addresses (improved from our earlier version) 2. Phone numbers (various formats) 3. URLs (basic validation) 4. Credit card numbers (with Luhn algorithm)

Each validator should: - Use recursive string processing - Handle edge cases gracefully - Provide clear error messages - Be composable (validators can call other validators)

Architecture: Validator Framework

First, let's build a framework for validators:

class ValidationResult:
    """
    Represents the result of a validation check.
    """
    def __init__(self, is_valid, message=""):
        self.is_valid = is_valid
        self.message = message

    def __bool__(self):
        return self.is_valid

    def __str__(self):
        status = "βœ“" if self.is_valid else "βœ—"
        return f"{status} {self.message}" if self.message else status

class StringValidator:
    """
    Base class for recursive string validators.
    """
    def validate(self, s):
        """
        Validate string. Must be implemented by subclasses.
        Returns ValidationResult.
        """
        raise NotImplementedError

    def __call__(self, s):
        """Allow validator to be called like a function."""
        return self.validate(s)

# Test the framework
result = ValidationResult(True, "Valid email format")
print(f"Result: {result}")
print(f"Boolean: {bool(result)}")

Python Output:

Result: βœ“ Valid email format
Boolean: True

Validator 1: Enhanced Email Validator

class EmailValidator(StringValidator):
    """
    Validate email addresses recursively.
    Format: username@domain.extension
    """

    def validate(self, email):
        """Main validation entry point."""
        if not email:
            return ValidationResult(False, "Email cannot be empty")

        # Find @ symbol
        at_pos = self._find_char(email, '@', 0)
        if at_pos == -1:
            return ValidationResult(False, "Missing @ symbol")

        # Check for multiple @
        if self._find_char(email, '@', at_pos + 1) != -1:
            return ValidationResult(False, "Multiple @ symbols")

        # Split and validate parts
        username = email[:at_pos]
        domain_part = email[at_pos + 1:]

        # Validate username
        if not self._validate_username(username, 0):
            return ValidationResult(False, "Invalid username format")

        # Find last dot in domain
        last_dot = self._find_last_char(domain_part, '.', 0, -1)
        if last_dot == -1:
            return ValidationResult(False, "Missing domain extension")

        domain = domain_part[:last_dot]
        extension = domain_part[last_dot + 1:]

        # Validate domain
        if not self._validate_domain(domain, 0):
            return ValidationResult(False, "Invalid domain format")

        # Validate extension
        if not self._validate_extension(extension, 0):
            return ValidationResult(False, "Invalid extension format")

        return ValidationResult(True, "Valid email format")

    def _find_char(self, s, char, start):
        """Recursively find character position."""
        if start >= len(s):
            return -1
        if s[start] == char:
            return start
        return self._find_char(s, char, start + 1)

    def _find_last_char(self, s, char, current, last_found):
        """Recursively find last occurrence of character."""
        if current >= len(s):
            return last_found
        if s[current] == char:
            return self._find_last_char(s, char, current + 1, current)
        return self._find_last_char(s, char, current + 1, last_found)

    def _validate_username(self, s, idx):
        """Validate username part recursively."""
        if len(s) == 0:
            return False
        if idx == 0 and s[0] == '.':
            return False  # Can't start with dot
        if idx == len(s) - 1 and s[idx] == '.':
            return False  # Can't end with dot
        if idx >= len(s):
            return True
        if not (s[idx].isalnum() or s[idx] == '.' or s[idx] == '_'):
            return False
        return self._validate_username(s, idx + 1)

    def _validate_domain(self, s, idx):
        """Validate domain part recursively."""
        if len(s) == 0:
            return False
        if idx == 0 and s[0] == '.':
            return False
        if idx == len(s) - 1 and s[idx] == '.':
            return False
        if idx >= len(s):
            return True
        if not (s[idx].isalnum() or s[idx] == '.' or s[idx] == '-'):
            return False
        return self._validate_domain(s, idx + 1)

    def _validate_extension(self, s, idx):
        """Validate extension part recursively."""
        if len(s) < 2:
            return False
        if idx >= len(s):
            return True
        if not s[idx].isalpha():
            return False
        return self._validate_extension(s, idx + 1)

# Test email validator
email_validator = EmailValidator()

email_tests = [
    "user@example.com",
    "john.doe@company.co.uk",
    "invalid",
    "user@@example.com",
    ".user@example.com",
    "user@example.",
    "user_name@sub-domain.example.com",
]

print("\nEmail Validation:")
for email in email_tests:
    result = email_validator.validate(email)
    print(f"{email:35} β†’ {result}")

Python Output:

Email Validation:
user@example.com                    β†’ βœ“ Valid email format
john.doe@company.co.uk              β†’ βœ“ Valid email format
invalid                             β†’ βœ— Missing @ symbol
user@@example.com                   β†’ βœ— Multiple @ symbols
.user@example.com                   β†’ βœ— Invalid username format
user@example.                       β†’ βœ— Invalid extension format
user_name@sub-domain.example.com    β†’ βœ“ Valid email format

Validator 2: Phone Number Validator

class PhoneValidator(StringValidator):
    """
    Validate phone numbers recursively.
    Supports formats:
    - (123) 456-7890
    - 123-456-7890
    - 1234567890
    - +1 (123) 456-7890
    """

    def validate(self, phone):
        """Main validation entry point."""
        if not phone:
            return ValidationResult(False, "Phone number cannot be empty")

        # Remove spaces for easier processing
        cleaned = self._remove_spaces(phone, 0, "")

        # Check for international prefix
        if cleaned[0] == '+':
            return self._validate_international(cleaned, 1)

        # Check for area code in parentheses
        if cleaned[0] == '(':
            return self._validate_with_parens(cleaned, 0)

        # Check for standard format with dashes
        if '-' in cleaned:
            return self._validate_with_dashes(cleaned, 0)

        # Check for plain digits
        return self._validate_plain_digits(cleaned, 0)

    def _remove_spaces(self, s, idx, result):
        """Recursively remove spaces from string."""
        if idx >= len(s):
            return result
        if s[idx] == ' ':
            return self._remove_spaces(s, idx + 1, result)
        return self._remove_spaces(s, idx + 1, result + s[idx])

    def _validate_international(self, s, idx):
        """Validate international format: +1(123)456-7890"""
        # After +, expect country code (1-3 digits)
        country_code_end = self._find_non_digit(s, idx)
        if country_code_end == -1:
            country_code_end = len(s)

        country_code_len = country_code_end - idx
        if country_code_len < 1 or country_code_len > 3:
            return ValidationResult(False, "Invalid country code")

        # Validate rest as domestic number
        rest = s[country_code_end:]
        if rest[0] == '(':
            return self._validate_with_parens(rest, 0)
        return self._validate_with_dashes(rest, 0)

    def _validate_with_parens(self, s, idx):
        """Validate format: (123)456-7890"""
        if s[idx] != '(':
            return ValidationResult(False, "Expected opening parenthesis")

        # Find closing paren
        close_paren = self._find_char(s, ')', idx + 1)
        if close_paren == -1:
            return ValidationResult(False, "Missing closing parenthesis")

        # Validate area code (3 digits)
        area_code = s[idx + 1:close_paren]
        if not self._all_digits(area_code, 0) or len(area_code) != 3:
            return ValidationResult(False, "Invalid area code")

        # Validate rest
        rest = s[close_paren + 1:]
        return self._validate_with_dashes(rest, 0)

    def _validate_with_dashes(self, s, idx):
        """Validate format: 123-456-7890 or 456-7890"""
        parts = self._split_by_dash(s, 0, "", [])

        if len(parts) == 2:
            # Format: 456-7890
            if len(parts[0]) != 3 or len(parts[1]) != 4:
                return ValidationResult(False, "Invalid phone format")
        elif len(parts) == 3:
            # Format: 123-456-7890
            if len(parts[0]) != 3 or len(parts[1]) != 3 or len(parts[2]) != 4:
                return ValidationResult(False, "Invalid phone format")
        else:
            return ValidationResult(False, "Invalid phone format")

        # Check all parts are digits
        for part in parts:
            if not self._all_digits(part, 0):
                return ValidationResult(False, "Phone number must contain only digits")

        return ValidationResult(True, "Valid phone format")

    def _validate_plain_digits(self, s, idx):
        """Validate format: 1234567890"""
        if len(s) != 10:
            return ValidationResult(False, "Phone number must be 10 digits")

        if not self._all_digits(s, 0):
            return ValidationResult(False, "Phone number must contain only digits")

        return ValidationResult(True, "Valid phone format")

    def _find_char(self, s, char, start):
        """Recursively find character."""
        if start >= len(s):
            return -1
        if s[start] == char:
            return start
        return self._find_char(s, char, start + 1)

    def _find_non_digit(self, s, start):
        """Recursively find first non-digit."""
        if start >= len(s):
            return -1
        if not s[start].isdigit():
            return start
        return self._find_non_digit(s, start + 1)

    def _all_digits(self, s, idx):
        """Recursively check if all characters are digits."""
        if idx >= len(s):
            return True
        if not s[idx].isdigit():
            return False
        return self._all_digits(s, idx + 1)

    def _split_by_dash(self, s, idx, current, parts):
        """Recursively split string by dashes."""
        if idx >= len(s):
            if current:
                parts.append(current)
            return parts

        if s[idx] == '-':
            if current:
                parts.append(current)
            return self._split_by_dash(s, idx + 1, "", parts)

        return self._split_by_dash(s, idx + 1, current + s[idx], parts)

# Test phone validator
phone_validator = PhoneValidator()

phone_tests = [
    "(123) 456-7890",
    "123-456-7890",
    "1234567890",
    "+1 (123) 456-7890",
    "456-7890",
    "123-45-6789",  # Wrong format
    "12345",        # Too short
    "abc-def-ghij", # Not digits
]

print("\nPhone Validation:")
for phone in phone_tests:
    result = phone_validator.validate(phone)
    print(f"{phone:25} β†’ {result}")

Python Output:

Phone Validation:
(123) 456-7890            β†’ βœ“ Valid phone format
123-456-7890              β†’ βœ“ Valid phone format
1234567890                β†’ βœ“ Valid phone format
+1 (123) 456-7890         β†’ βœ“ Valid phone format
456-7890                  β†’ βœ“ Valid phone format
123-45-6789               β†’ βœ— Invalid phone format
12345                     β†’ βœ— Phone number must be 10 digits
abc-def-ghij              β†’ βœ— Phone number must contain only digits

Validator 3: URL Validator

class URLValidator(StringValidator):
    """
    Validate URLs recursively.
    Format: protocol://domain.extension/path?query#fragment
    """

    def validate(self, url):
        """Main validation entry point."""
        if not url:
            return ValidationResult(False, "URL cannot be empty")

        # Check for protocol
        protocol_end = self._find_substring(url, "://", 0)
        if protocol_end == -1:
            return ValidationResult(False, "Missing protocol (http:// or https://)")

        protocol = url[:protocol_end]
        if not self._validate_protocol(protocol, 0):
            return ValidationResult(False, "Invalid protocol")

        # Extract domain part (after protocol, before path/query/fragment)
        rest = url[protocol_end + 3:]
        domain_end = self._find_first_of(rest, ['/', '?', '#'], 0)
        if domain_end == -1:
            domain_end = len(rest)

        domain = rest[:domain_end]
        if not self._validate_domain(domain, 0):
            return ValidationResult(False, "Invalid domain")

        # Path/query/fragment are optional, so we're done
        return ValidationResult(True, "Valid URL format")

    def _find_substring(self, s, substr, start):
        """Recursively find substring position."""
        if start + len(substr) > len(s):
            return -1
        if s[start:start + len(substr)] == substr:
            return start
        return self._find_substring(s, substr, start + 1)

    def _find_first_of(self, s, chars, idx):
        """Recursively find first occurrence of any character in chars."""
        if idx >= len(s):
            return -1
        if s[idx] in chars:
            return idx
        return self._find_first_of(s, chars, idx + 1)

    def _validate_protocol(self, s, idx):
        """Validate protocol (http or https)."""
        valid_protocols = ['http', 'https', 'ftp', 'ftps']
        return s in valid_protocols

    def _validate_domain(self, s, idx):
        """Validate domain part."""
        if len(s) == 0:
            return False

        # Domain should have at least one dot
        if '.' not in s:
            return False

        # Check characters are valid
        if idx >= len(s):
            return True

        if not (s[idx].isalnum() or s[idx] in ['.', '-', '_']):
            return False

        return self._validate_domain(s, idx + 1)

# Test URL validator
url_validator = URLValidator()

url_tests = [
    "http://example.com",
    "https://www.example.com",
    "https://sub.domain.example.com/path",
    "ftp://files.example.com",
    "example.com",  # Missing protocol
    "http://",      # Missing domain
    "http://invalid domain.com",  # Space in domain
]

print("\nURL Validation:")
for url in url_tests:
    result = url_validator.validate(url)
    print(f"{url:40} β†’ {result}")

Python Output:

URL Validation:
http://example.com                       β†’ βœ“ Valid URL format
https://www.example.com                  β†’ βœ“ Valid URL format
https://sub.domain.example.com/path      β†’ βœ“ Valid URL format
ftp://files.example.com                  β†’ βœ“ Valid URL format
example.com                              β†’ βœ— Missing protocol (http:// or https://)
http://                                  β†’ βœ— Invalid domain
http://invalid domain.com                β†’ βœ— Invalid domain

Validator 4: Credit Card Validator (Luhn Algorithm)

class CreditCardValidator(StringValidator):
    """
    Validate credit card numbers using Luhn algorithm recursively.
    """

    def validate(self, card_number):
        """Main validation entry point."""
        if not card_number:
            return ValidationResult(False, "Card number cannot be empty")

        # Remove spaces and dashes
        cleaned = self._remove_non_digits(card_number, 0, "")

        # Check length (13-19 digits for most cards)
        if len(cleaned) < 13 or len(cleaned) > 19:
            return ValidationResult(False, "Invalid card number length")

        # Check all digits
        if not self._all_digits(cleaned, 0):
            return ValidationResult(False, "Card number must contain only digits")

        # Apply Luhn algorithm
        if not self._luhn_check(cleaned):
            return ValidationResult(False, "Invalid card number (failed Luhn check)")

        return ValidationResult(True, "Valid card number")

    def _remove_non_digits(self, s, idx, result):
        """Recursively remove non-digit characters."""
        if idx >= len(s):
            return result
        if s[idx].isdigit():
            return self._remove_non_digits(s, idx + 1, result + s[idx])
        return self._remove_non_digits(s, idx + 1, result)

    def _all_digits(self, s, idx):
        """Recursively check if all characters are digits."""
        if idx >= len(s):
            return True
        if not s[idx].isdigit():
            return False
        return self._all_digits(s, idx + 1)

    def _luhn_check(self, card_number):
        """
        Implement Luhn algorithm recursively.

        Algorithm:
        1. Starting from rightmost digit, double every second digit
        2. If doubled digit > 9, subtract 9
        3. Sum all digits
        4. If sum % 10 == 0, valid
        """
        # Convert to list of integers
        digits = [int(d) for d in card_number]

        # Process from right to left
        total = self._luhn_sum(digits, len(digits) - 1, False, 0)

        return total % 10 == 0

    def _luhn_sum(self, digits, idx, should_double, current_sum):
        """
        Recursively calculate Luhn sum.

        Args:
            digits: List of digit integers
            idx: Current index (processing right to left)
            should_double: Whether to double current digit
            current_sum: Accumulated sum
        """
        # Base case: processed all digits
        if idx < 0:
            return current_sum

        digit = digits[idx]

        if should_double:
            digit *= 2
            if digit > 9:
                digit -= 9

        # Recurse with next digit (moving left)
        return self._luhn_sum(digits, idx - 1, not should_double, current_sum + digit)

# Test credit card validator
card_validator = CreditCardValidator()

card_tests = [
    "4532015112830366",      # Valid Visa
    "6011514433546201",      # Valid Discover
    "378282246310005",       # Valid Amex
    "1234567890123456",      # Invalid (fails Luhn)
    "4532-0151-1283-0366",   # Valid with dashes
    "4532 0151 1283 0366",   # Valid with spaces
    "123",                   # Too short
]

print("\nCredit Card Validation:")
for card in card_tests:
    result = card_validator.validate(card)
    print(f"{card:25} β†’ {result}")

Python Output:

Credit Card Validation:
4532015112830366          β†’ βœ“ Valid card number
6011514433546201          β†’ βœ“ Valid card number
378282246310005           β†’ βœ“ Valid card number
1234567890123456          β†’ βœ— Invalid card number (failed Luhn check)
4532-0151-1283-0366       β†’ βœ“ Valid card number
4532 0151 1283 0366       β†’ βœ“ Valid card number
123                       β†’ βœ— Invalid card number length

Manual Trace: Luhn Algorithm

Let's trace the Luhn check for "1234" by hand:

_luhn_check("1234")
  β†’ digits = [1, 2, 3, 4]
  β†’ _luhn_sum([1,2,3,4], idx=3, should_double=False, sum=0)

_luhn_sum([1,2,3,4], idx=3, should_double=False, sum=0)
  β†’ digit = 4
  β†’ should_double = False, so digit stays 4
  β†’ recurse: _luhn_sum([1,2,3,4], idx=2, should_double=True, sum=4)

_luhn_sum([1,2,3,4], idx=2, should_double=True, sum=4)
  β†’ digit = 3
  β†’ should_double = True, so digit = 3 * 2 = 6
  β†’ 6 <= 9, so digit stays 6
  β†’ recurse: _luhn_sum([1,2,3,4], idx=1, should_double=False, sum=10)

_luhn_sum([1,2,3,4], idx=1, should_double=False, sum=10)
  β†’ digit = 2
  β†’ should_double = False, so digit stays 2
  β†’ recurse: _luhn_sum([1,2,3,4], idx=0, should_double=True, sum=12)

_luhn_sum([1,2,3,4], idx=0, should_double=True, sum=12)
  β†’ digit = 1
  β†’ should_double = True, so digit = 1 * 2 = 2
  β†’ 2 <= 9, so digit stays 2
  β†’ recurse: _luhn_sum([1,2,3,4], idx=-1, should_double=False, sum=14)

_luhn_sum([1,2,3,4], idx=-1, should_double=False, sum=14)
  β†’ idx < 0 β†’ BASE CASE
  β†’ return 14

Back in _luhn_check:
  β†’ total = 14
  β†’ 14 % 10 = 4 (not 0)
  β†’ return False

So "1234" is not a valid card number according to Luhn.

Composable Validators: Putting It All Together

class CompositeValidator:
    """
    Validate multiple formats using composition.
    """
    def __init__(self):
        self.validators = {
            'email': EmailValidator(),
            'phone': PhoneValidator(),
            'url': URLValidator(),
            'card': CreditCardValidator(),
        }

    def validate(self, input_type, value):
        """
        Validate value based on input type.

        Args:
            input_type: One of 'email', 'phone', 'url', 'card'
            value: String to validate
        """
        if input_type not in self.validators:
            return ValidationResult(False, f"Unknown type: {input_type}")

        return self.validators[input_type].validate(value)

    def validate_form(self, form_data):
        """
        Validate entire form recursively.

        Args:
            form_data: Dict mapping field names to (type, value) tuples

        Returns:
            Dict mapping field names to ValidationResult
        """
        return self._validate_fields(list(form_data.items()), 0, {})

    def _validate_fields(self, fields, idx, results):
        """Recursively validate form fields."""
        if idx >= len(fields):
            return results

        field_name, (field_type, field_value) = fields[idx]
        result = self.validate(field_type, field_value)
        results[field_name] = result

        return self._validate_fields(fields, idx + 1, results)

# Test composite validator
composite = CompositeValidator()

# Validate individual values
print("\nComposite Validator - Individual:")
print(f"Email: {composite.validate('email', 'user@example.com')}")
print(f"Phone: {composite.validate('phone', '(123) 456-7890')}")
print(f"URL: {composite.validate('url', 'https://example.com')}")
print(f"Card: {composite.validate('card', '4532015112830366')}")

# Validate entire form
form_data = {
    'email': ('email', 'john.doe@company.com'),
    'phone': ('phone', '123-456-7890'),
    'website': ('url', 'https://john-doe.com'),
    'payment': ('card', '4532-0151-1283-0366'),
}

print("\nComposite Validator - Form:")
results = composite.validate_form(form_data)
for field, result in results.items():
    print(f"{field:10} β†’ {result}")

Python Output:

Composite Validator - Individual:
Email: βœ“ Valid email format
Phone: βœ“ Valid phone format
URL: βœ“ Valid URL format
Card: βœ“ Valid card number

Composite Validator - Form:
email      β†’ βœ“ Valid email format
phone      β†’ βœ“ Valid phone format
website    β†’ βœ“ Valid URL format
payment    β†’ βœ“ Valid card number

Project Summary: What We Built

We created a production-quality validation framework using recursive string processing:

Key Techniques Applied: 1. Structure-first validation (email, URL) 2. Character-by-character processing (all validators) 3. Recursive string transformation (removing spaces, splitting) 4. Recursive algorithm implementation (Luhn check) 5. Composition and reusability (CompositeValidator)

Complexity Analysis:

Validator Time Complexity Space Complexity Call Stack Depth
Email O(n) O(n) O(n)
Phone O(n) O(n) O(n)
URL O(n) O(n) O(n)
Credit Card O(n) O(n) O(n)

All validators are linear time and space, making them suitable for production use.

When Recursion Shines vs. When It Doesn't

Recursion is excellent for: - βœ“ Nested structure validation (parentheses, HTML tags) - βœ“ Pattern matching with backtracking (wildcards, regex) - βœ“ Composable validation logic - βœ“ Clear expression of recursive rules

Recursion is overkill for: - βœ— Simple character-by-character checks (use loops) - βœ— String reversal (use slicing: s[::-1]) - βœ— Finding substrings (use str.find()) - βœ— Splitting strings (use str.split())

The key question: Does the problem have recursive structure (nested, branching, or self-similar), or is it just sequential processing?

For our validators: - Email/phone/URL: Sequential processing β†’ recursion is elegant but not necessary - Wildcard matching: Branching exploration β†’ recursion is the natural fit - Luhn algorithm: Recursive definition β†’ recursion expresses the algorithm clearly

Extension Challenges

Try extending the project:

  1. Add password validator with recursive strength checking
  2. Implement regex-like pattern matching with more wildcards (+, *, [])
  3. Add nested structure validation (balanced parentheses, HTML tags)
  4. Optimize with memoization for repeated validation calls
  5. Add detailed error reporting showing exactly where validation failed

Chapter Summary and Key Takeaways

Chapter Summary and Key Takeaways

The Journey: From Characters to Validators

We started with the fundamental insight that strings are recursive structuresβ€”sequences that can be decomposed into first + rest, just like lists. Through progressive refinement, we built increasingly sophisticated string processing techniques:

Iteration Problem Technique Introduced Key Insight
0 Email validation Character-by-character Naive recursion fails with complex structures
1 Multiple dots in domain Structure-first validation Identify boundaries before processing
2 Palindrome checking Two-pointer recursion Process from both ends simultaneously
3 String reversal Accumulator pattern Build results during recursion, not after
4 Wildcard matching Backtracking recursion Explore multiple possibilities naturally
5 Production validators Composition and reusability Combine recursive patterns into frameworks

Core Patterns Mastered

1. First + Rest Pattern

def process_string(s):
    if len(s) == 0:
        return base_case
    first = s[0]
    rest = s[1:]
    return combine(first, process_string(rest))

When to use: Sequential processing, building/transforming strings

2. Two-Pointer Pattern

def process_string(s, left=0, right=None):
    if right is None:
        right = len(s) - 1
    if left >= right:
        return base_case
    # Process s[left] and s[right]
    return process_string(s, left + 1, right - 1)

When to use: Palindromes, symmetric validation, comparing ends

3. Accumulator Pattern

def process_string(s, index=0, accumulator=""):
    if index >= len(s):
        return accumulator
    # Update accumulator
    return process_string(s, index + 1, new_accumulator)

When to use: Building results, avoiding repeated concatenation

4. Backtracking Pattern

def match_pattern(text, pattern, t_idx=0, p_idx=0):
    # Base cases
    if t_idx == len(text) and p_idx == len(pattern):
        return True

    # Try multiple possibilities
    if pattern[p_idx] == '*':
        # Option A: skip wildcard
        if match_pattern(text, pattern, t_idx, p_idx + 1):
            return True
        # Option B: consume character
        if match_pattern(text, pattern, t_idx + 1, p_idx):
            return True

    # Continue matching
    return match_pattern(text, pattern, t_idx + 1, p_idx + 1)

When to use: Pattern matching, exploring solution spaces

Complexity Characteristics

Time Complexity: - Linear patterns (first+rest, two-pointer, accumulator): O(n) - Backtracking patterns (wildcards): O(2^n) worst case

Space Complexity: - All recursive string functions: O(n) call stack depth - Accumulator with strings: O(nΒ²) if concatenating, O(n) if using lists

Key Optimization: Use list accumulation instead of string concatenation:

# Bad: O(nΒ²) time
def build_string(s, idx, result=""):
    return build_string(s, idx+1, result + s[idx])

# Good: O(n) time
def build_string(s, idx, result=None):
    if result is None:
        result = []
    result.append(s[idx])
    return ''.join(result)  # Only at the end

When to Use Recursion for Strings

Recursion is the right choice when: - βœ“ Problem has nested or branching structure (parentheses, wildcards) - βœ“ Multiple possibilities need exploration (pattern matching) - βœ“ Recursive definition is clearer than iterative (Luhn algorithm) - βœ“ Building composable, reusable validators

Iteration is better when: - βœ— Simple sequential processing (use for loops) - βœ— Built-in methods exist (str.find(), str.split()) - βœ— Performance is critical and recursion adds overhead - βœ— Problem is naturally iterative (counting characters)

Debugging Recursive String Functions

Common failure modes:

  1. Infinite recursion: Forgot base case or base case never reached
  2. Fix: Ensure every recursive call makes progress toward base case

  3. Off-by-one errors: Wrong string slicing boundaries

  4. Fix: Trace execution manually with small input

  5. Incorrect string building: O(nΒ²) concatenation

  6. Fix: Use list accumulator, join at end

  7. Missing edge cases: Empty strings, single characters

  8. Fix: Test with "", "a", "ab" explicitly

Debugging workflow: 1. Add print statements showing parameters at each call 2. Trace execution manually with small input (3-5 characters) 3. Verify base cases are correct and reachable 4. Check that recursive calls make progress

Connection to Future Topics

The patterns we learned here will reappear throughout the course:

Final Thoughts

String recursion teaches us that the right recursive pattern depends on the problem structure:

The key skill is recognizing which pattern fits the problem. With practice, this recognition becomes intuitive.

In the next module, we'll see how these patterns extend to more complex data structures, starting with divide-and-conquer algorithms that split problems in half rather than peeling off one element at a time.